Dataset of ocean vessel traffic in the North Sea

The Automatic Identification System (AIS) is a technology that allows ships to broadcast their position, course, speed, and other information to other vessels or shore-based stations. By collecting and analysing this data, it is possible to create a heatmap of ship activity in a particular region, such as the North Sea. This heatmap acts as a representation of vessel activity per class. A heatmap in a standard geoinformatics format may be preferable to scientific researchers, as it allows users to quickly overlay their own data onto the vessel density layer, providing spatial context and the ability to compare their dataset to the distribution and intensity of ship activity in a particular region. This dataset represents ocean vessel activity in the North Sea for 2022 and was created using AIS data collected by multiple coastal receivers. The dataset was created from reported vessel positions aggregated both spatially and temporally. The end goal of this data processing is to provide a publicly available spatial layer that can be queried to provide monthly vessel traffic statistics for a region in the North Sea. The data was spatially filtered to only include AIS messages with latitudes between 49.5 and 53.8 degrees North and longitudes between 0.2 and 7 degrees East. The bounding box was chosen as it includes the Belgian canals and the Belgian part of the North Sea. The dataset has multiple uses as a collaboration dataset. Example use cases include using it as a time series of statistical priors¹ for vessel classes in order to improve vessel classification algorithms, and visualising vessel behaviour in order to locate potential mooring sites where the risk of fishing net snags is low. It has also been used to locate areas of potential anchor scarring in anchorages near ports.


Value of the Data
• This dataset is an example of how to aggregate large volumes of AIS data, using open-source tools, into a geospatial product that is publicly available. While there are datasets available that cover the same area and are released in similar formats [2,3], these datasets have commercial data as an input source. While this data is similar to the EMODnet Human Activities Vessel Density dataset,² it used open-source software and open datasets to create a data product that could easily be recreated for areas not covered by the EMODnet layer, and it allows users to compare multiple overlapping datasets to determine accuracy and suitability for their purposes.
• This dataset uses OGC standards to allow researchers to easily compare their own spatial datasets with one that describes human activity in the Belgian part of the North Sea. Some specific examples include comparing GPS bird tracks to fishing vessels, comparing vessel traffic to undersea acoustic sensor networks, and comparing vessel anchorages with benthic floor scarring.
• Subsets of this dataset can be requested for specific months or vessel classes, allowing users to map changes in vessel traffic over time or to map class-specific locations like fishing grounds or anchorages.
• Using this dataset can allow users to highlight areas of high traffic that may have an increased risk of vessel collisions [4].
• Vessel traffic is a major contributor to air pollution, and a map showing the location, the time spent within an area and the class of vessels within that area can assist with modelling air pollution and underwater acoustic noise pollution [5]. The grid is also hexagonal, which approximates the propagation of pollution better than a standard rectangular grid.
• This dataset can be used as an input feature in an AIS vessel classification algorithm [6]. It allows users to retrieve statistical priors for specific regions, which would provide more context to unlabelled vessel detections and could improve classifier accuracy.

Data Description
The data is a geographic representation of vessel traffic and is accessible through a geographical data server, GeoServer. GeoServer allows users to filter and request the underlying data in many different formats or spatial projections. The bounding box of the data is shown in Fig. 1.
GeoServer allows for both machine-to-machine communication and user interaction for data querying and download. Users can constrain the request URL by using other information, e.g. month, year, type, thereby extracting only the information they are interested in. For example, to constrain to month = 'Jan', (vessel) type = 'Cargo' and year = 2023:
https://geo.vliz.be/geoserver/OpenAIS/wfs?service=WFS&version=1.1.0&request=getFeature&typeName=OpenAIS:vessel_density&cql_filter=month='Jan'%20AND%20type='Cargo'%20AND%20year=2023&outputFormat=csv
Fig. 2 shows the data represented by a web map and a table showing data for grid cells near the mouse click. The URL used to retrieve this map is:
https://geo.vliz.be/geoserver/OpenAIS/wms?service=WMS&version=1.1.0&request=GetMap&layers=OpenAIS%3Avessel_density&bbox=2.0%2C51.4%2C4.2%2C52.7&width=768&height=485&srs=EPSG%3A4326&styles=&format=application/openlayers
The data can be shown or downloaded in various formats, and in each case it has the fields shown in Table 1. If the dataset were downloaded as a CSV file, for example, it would be represented by a table with the columns shown in Table 1.
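As an illustration of the machine-to-machine route, the constrained WFS request can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The helper name `build_wfs_url` is hypothetical, and the sketch percent-encodes the quotes and equals signs inside the filter, which is an equivalent encoding of the same request rather than a byte-for-byte copy of the URL given above.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

# Endpoint and layer name as they appear in the WFS URL quoted in the text.
BASE_WFS_URL = "https://geo.vliz.be/geoserver/OpenAIS/wfs"
LAYER = "OpenAIS:vessel_density"


def build_wfs_url(month, vessel_type, year, output_format="csv"):
    """Build a WFS getFeature URL constrained by month, vessel type and year.

    Hypothetical helper: the field names (month, type, year) are taken from
    the example filter in the text.
    """
    cql_filter = f"month='{month}' AND type='{vessel_type}' AND year={year}"
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "getFeature",
        "typeName": LAYER,
        "cql_filter": cql_filter,
        "outputFormat": output_format,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{BASE_WFS_URL}?{urlencode(params, quote_via=quote)}"


url = build_wfs_url("Jan", "Cargo", 2023)
# Decoding the query string recovers the original filter expression.
decoded = parse_qs(urlsplit(url).query)
print(decoded["cql_filter"][0])
```

The resulting URL can be passed to any HTTP client; no request is made by the sketch itself.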

Raw data
Streaming AIS data is obtained from AISHub.³ The data is collected from multiple coastal AIS receivers but without a station ID attached, making it difficult to understand which stations are active and inactive.

Table 1
Example data row for single grid cell.

| Field | Description | Example |
|---|---|---|
| FID | GeoServer-generated Field ID (FID). This is a unique ID for each database row. | vessel_density.fid-5c9490ad_186938bf9cc_-2202 |
| Gid | Geometry ID (GID). This is a unique ID for each cell geometry. It is repeated multiple times in the database for different vessel classes, months, etc. | 117620 |
| Year | The year that the data was captured in, in YYYY format. | 2022 |
| Month | A three-letter representation of the month the data was aggregated over. | Nov |
| agg_datetime | The ISO datetime that the data was aggregated over. This is the first day of each month that the data represents and is included to allow the data to be ordered in time. | 2022-11-01T00:00:00 |

The raw data is streamed to the VLIZ server at about 500 messages per second and is then reduced to around 120 messages per second after spatial filtering.
The data streaming in real time from AISHub is decoded, filtered and stored in the Postgres + TimescaleDB + PostGIS database using open-source AIS tools.⁴ Fig. 3 shows a block diagram describing how vessel location data is moved from on-board AIS transmitters to the publicly available dataset through the VLIZ OpenAIS instance.

Data preprocessing
The raw AIS data is first decoded into a dictionary with fields that depend on the AIS message type received. The decoder used is "libais", available from PyPI.⁵ After decoding, the messages that carry spatial data, mainly position report messages, are filtered to remove messages outside the area of interest. Not all messages contain spatial information that can be limited to the North Sea, and these are stored without filtering. Messages are then grouped into two categories: position reports and voyage reports. In some cases a single message can contain enough information to be used in both categories. The position reports are inserted into a position report table in the database, and the same is done for voyage reports.
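The filtering and grouping steps above can be sketched as follows. This is an illustrative sketch only, not the OpenAIS implementation. It operates on already-decoded dictionaries and assumes the keys `id` (message type), `x` (longitude) and `y` (latitude); the message type numbers assigned to each category and the table names are likewise assumptions, since the text does not list them.

```python
# Bounding box from the text: 49.5-53.8 degrees North, 0.2-7 degrees East.
LAT_MIN, LAT_MAX = 49.5, 53.8
LON_MIN, LON_MAX = 0.2, 7.0

# ASSUMPTION: AIS message types treated as position or voyage reports.
# Type 19 appears in both sets because it carries position and static data.
POSITION_TYPES = {1, 2, 3, 18, 19}
VOYAGE_TYPES = {5, 19, 24}


def in_area(msg):
    """True when the message position lies inside the area of interest."""
    return LON_MIN <= msg["x"] <= LON_MAX and LAT_MIN <= msg["y"] <= LAT_MAX


def route_message(msg):
    """Return the (hypothetical) table names a decoded message is stored in."""
    has_position = "x" in msg and "y" in msg
    # Messages with spatial data outside the bounding box are dropped;
    # messages without spatial data are kept without filtering.
    if has_position and not in_area(msg):
        return []
    tables = []
    msg_type = msg.get("id")
    if msg_type in POSITION_TYPES and has_position:
        tables.append("position_reports")
    if msg_type in VOYAGE_TYPES:
        tables.append("voyage_reports")
    return tables


inside = {"id": 1, "mmsi": 111111111, "x": 2.9, "y": 51.2}
outside = {"id": 1, "mmsi": 111111111, "x": -5.0, "y": 51.2}
voyage = {"id": 5, "mmsi": 111111111, "name": "EXAMPLE"}
both = {"id": 19, "mmsi": 222222222, "x": 3.1, "y": 51.4, "name": "EXAMPLE B"}
```

A message of a type that carries both position and static data is routed to both tables, matching the case described above where a single message serves both categories.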

Data processing
Once the data is stored in the database, more complex aggregations are possible. This dataset is one such example. The data is aggregated over a 1 km² hexagonal grid in order to create a value for the number of hours per month that each (and all) vessel type(s) spend within the grid cell. A hexagonal grid was chosen for several reasons:
• Underwater sound propagation is radial in nature and hexagonal grids approximate the propagation pattern better than square grids do.
• Determining traffic routing works better when the distances between neighbouring cells are equal. Diagonal routing with square grids is problematic.
The aggregate is calculated by creating a moving window over the AIS data, where segments are created from each AIS message and the prior AIS message for the same vessel. Data checks are done on all segments, and those with a length of 0 (due to duplicate messages) or greater than 0.05 degrees (approximately 5 km) are discarded. The 0.05 degree limit is defined in order to remove large jumps in vessel positions that might cut across land or rivers and be unrepresentative of normal ship behaviour. The time differences from consecutive AIS messages are added together for each grid cell and vessel class. For segments that cross multiple grid cells, the time is split between grid cells based on the portion of the segment within the cell over the total segment length.
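The segment construction and data checks can be sketched in a few lines. The sketch below is a simplified illustration under stated assumptions: messages are given as `(timestamp_seconds, lon, lat)` tuples for a single vessel, and segment length is measured as a planar distance in degrees. The function name `build_segments` is hypothetical.

```python
import math

# Upper limit on segment length from the text (approximately 5 km).
MAX_SEGMENT_DEG = 0.05


def build_segments(messages):
    """Pair each message with the prior one and drop invalid segments.

    messages: iterable of (timestamp_seconds, lon, lat) for ONE vessel.
    ASSUMPTION: length is the planar distance in degrees between positions.
    """
    ordered = sorted(messages)  # order in time
    segments = []
    for (t0, lon0, lat0), (t1, lon1, lat1) in zip(ordered, ordered[1:]):
        length = math.hypot(lon1 - lon0, lat1 - lat0)
        # Zero length indicates a duplicate position; a long segment
        # indicates a jump that may cut across land or rivers.
        if length == 0 or length > MAX_SEGMENT_DEG:
            continue
        segments.append(
            {
                "start": (lon0, lat0),
                "end": (lon1, lat1),
                "length_deg": length,
                "dt": t1 - t0,
            }
        )
    return segments


track = [
    (0, 2.90, 51.23),
    (10, 2.90, 51.23),  # duplicate position: zero-length segment, discarded
    (20, 2.91, 51.24),  # normal movement: kept
    (30, 3.50, 51.50),  # large jump: discarded
]
segments = build_segments(track)
```

Only the middle segment of the example track survives the checks; its time delta is what would subsequently be apportioned to grid cells.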
A simplified example is shown in Fig. 4. In this example, 3 messages are received from a ship exiting the port of Oostende. The 3 messages are placed into 2 segments with lengths L1 and L2. The time delta of the first segment, t1 - t0, is associated with the grid cell Grid n, while the second segment is associated with both Grid n and Grid m. The time delta of the second segment is split between the two cells by calculating the portion of the segment within each grid cell and apportioning the time delta in proportion to the length within each cell. The assumption that segment length is proportional to time spent in a cell breaks down when there are large changes in vessel velocity within a segment. Fortunately, the AIS protocol states that large changes in a ship's velocity require more frequent AIS message transmissions.
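The apportioning rule in this example can be expressed compactly. The following is a minimal sketch and not the database implementation: it assumes the length of each segment inside each cell has already been computed (for instance by intersecting the segment with the hexagonal grid), and the cell labels, lengths and time deltas are made-up illustrative values.

```python
from collections import defaultdict


def apportion_time(dt, length_in_cell):
    """Split a segment's time delta across cells in proportion to length.

    dt: time delta of the segment.
    length_in_cell: mapping of cell id -> length of the segment in that cell.
    """
    total = sum(length_in_cell.values())
    if total <= 0:
        raise ValueError("segment must have a positive total length")
    return {cell: dt * length / total for cell, length in length_in_cell.items()}


def accumulate(segments):
    """Sum the apportioned time per cell over (dt, length_in_cell) pairs."""
    totals = defaultdict(float)
    for dt, length_in_cell in segments:
        for cell, share in apportion_time(dt, length_in_cell).items():
            totals[cell] += share
    return dict(totals)


# Hypothetical numbers for the two-segment example: the first segment lies
# entirely in cell "n"; three quarters of the second lies in "n" and the
# remainder in "m". Each segment spans 60 seconds.
example = [
    (60.0, {"n": 1.0}),
    (60.0, {"n": 0.75, "m": 0.25}),
]
time_per_cell = accumulate(example)
```

With these illustrative values, cell n receives the full first time delta plus three quarters of the second, and cell m receives the remaining quarter.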
The methods used to calculate the values for each cell were chosen to remove known issues with the AIS protocol, like irregular transmission rates, poor reception far from shore, and duplicate messages from receivers located near each other. Some other heatmap products [2,7] use AIS message point locations as a proxy for density, but this can lead to underestimating density in locations where messages might not be received due to poor receiver coverage. The method chosen here aggregates the line segments created by a time series of messages from vessels and has several advantages over normal point-based heatmap aggregations [8].
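As shown in Fig. 4, the time associated with Grid n is the full time delta of the first segment plus the share of the second segment's time delta that falls within the cell:
$$T_{n} = (t_1 - t_0) + (t_2 - t_1)\,\frac{L_{2,n}}{L_2}$$
where $L_{2,n}$ denotes the length of the portion of segment $L_2$ that lies within Grid n. These aggregations are calculated each day and then averaged into a monthly aggregate at the end of each month.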
The vessel class associated with the AIS messages received is derived from the "voyage report" types of AIS messages. These define several classes for vessels, as described in Table 2.

Validation
The data heatmap was compared to the EMODnet Human Activities Vessel Density layer [2] for 2022 for the same region. The comparison was done both spatially and statistically to determine any errors and their location. It must be noted that the EMODnet layer uses a rectangular grid in the ETRS89-LAEA⁶ projection, while the OpenAIS vessel density layer is hexagonal and, while created with a Belgian Lambert projection (EPSG SRID: 31370⁷), is stored in EPSG:3857.⁸ These differences result in non-overlapping grid cells that will have a significant impact on areas with tight shipping lanes like anchorages, rivers and ports. Fig. 5 shows the difference between the 2022 EMODnet Human Activities Vessel Density layer and the 2022 OpenAIS Vessel Density layer.
Areas shown as blue in the map are areas where the EMODnet Vessel Density layer reports a higher vessel density than the OpenAIS layer, while red is the opposite. The majority of differences occur in regions of the ocean far offshore or where there are few coastal receivers. There are also significant, sharp differences in regions where vessel density is tightly grouped, like ports, anchorages and canals. The differences in the tightly grouped regions can partly be attributed to differences in the grids chosen, hexagonal vs rectangular, but this does not explain all the differences noted.
Fig. 6 shows a histogram of the differences for all overlapping pixels. The histogram is slightly skewed towards the positive half, indicating that EMODnet has a higher pixel value than OpenAIS on average, but it is also centred around zero, indicating general agreement between the two density layers.
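A per-pixel comparison of this kind can be sketched as below. This is an assumption-laden illustration rather than the procedure used for Fig. 6: it presumes both layers have already been resampled to a common set of pixels, takes the difference as EMODnet minus OpenAIS so that positive values mean EMODnet is higher, and uses made-up input values and a hypothetical function name.

```python
import math
from collections import Counter


def difference_histogram(emodnet, openais, bin_width=1.0):
    """Bin per-pixel differences (EMODnet minus OpenAIS) and return the mean.

    emodnet, openais: equal-length sequences of values for the same pixels.
    Returns (histogram, mean_difference), where histogram maps the lower
    edge of each bin to the number of pixels falling in that bin.
    """
    if len(emodnet) != len(openais):
        raise ValueError("both layers must cover the same pixels")
    if not emodnet:
        raise ValueError("no overlapping pixels to compare")
    diffs = [e - o for e, o in zip(emodnet, openais)]
    bins = Counter(math.floor(d / bin_width) * bin_width for d in diffs)
    mean_diff = sum(diffs) / len(diffs)
    return dict(sorted(bins.items())), mean_diff


# Hypothetical values for four overlapping pixels.
emodnet_values = [5.0, 3.0, 2.0, 8.0]
openais_values = [4.0, 3.5, 2.0, 5.5]
histogram, mean_difference = difference_histogram(emodnet_values, openais_values)
```

A positive mean difference together with most of the mass in the bins around zero would correspond to the pattern described for Fig. 6.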

Limitations
The limited number and range of coastal AIS receivers create several limitations. Vessels beyond the reception range of coastal receivers, or in areas without coastal receivers, are not detected. Vessels with Class B transceivers are also not detected as reliably as vessels with Class A transceivers, as Class B transceivers have a lower RF output power. Class B transceivers are typically carried by smaller vessels like sail boats, fishing and recreational vessels.
There are also several computational limitations on the VLIZ server infrastructure, which can sometimes result in timeouts when attempting to visualise the data as a web map. There are plans in place to address the computational limitations, but these will require time.

Ethics Statement
The primary AIS data used in this dataset complies with the data provider's terms of use, specifically: "Every AISHub contributor is allowed to use the data for free". This work meets the requirements for ethical publishing (https://www.elsevier.com/authors/policies-and-guidelines). The work does not include chemicals, procedures or equipment that

Fig. 4. Simplified example of time aggregation for a vessel crossing multiple grid cells.
© 2023 The Authors. Published by Elsevier Inc.
Data collection: AIS data was obtained through the AISHub data sharing network. AISHub aggregates data streams from various sensors. The data was decoded, filtered and inserted into a PostgreSQL databaseᵇ using open-source tools [1]. The sensors used are of various makes and models and there was no ability to determine the source of any specific AIS message. VLIZ installed a COMAR Systems R400Nᶜ AIS receiver with VHF whip antenna near the port of Oostende, Belgium. The AIS messages were grouped by vessel ID and class, and ordered in time to create vessel trajectories. These were overlaid onto a pre-generated hexagonal
a https://www.ogc.org/standard/wfs/
b https://www.postgresql.org/
c https://comarsystems.com/product/r400n-network-ais-receiver-for-coastal-monitoring-applications/